
    Document-level machine translation : ensuring translational consistency of non-local phenomena

    In this thesis, we study the automatic translation of documents by taking into account cross-sentence phenomena. This document-level information is typically ignored by most of the standard state-of-the-art Machine Translation (MT) systems, which focus on translating texts processing each of their sentences in isolation. Translating each sentence without looking at its surrounding context can lead to certain types of translation errors, such as inconsistent translations for the same word or for elements in a coreference chain. We introduce methods to attend to document-level phenomena in order to avoid those errors, and thus, reach translations that properly convey the original meaning. Our research starts by identifying the translation errors related to such document-level phenomena that commonly appear in the output of state-of-the-art Statistical Machine Translation (SMT) systems. For two of those errors, namely inconsistent word translations as well as gender and number disagreements among words, we design simple and yet effective post-processing techniques to tackle and correct them. Since these techniques are applied a posteriori, they can access the whole source and target documents, and hence, they are able to perform a global analysis and improve the coherence and consistency of the translation. Nevertheless, since following such a two-pass decoding strategy is not optimal in terms of efficiency, we also focus on introducing the context-awareness during the decoding process itself. To this end, we enhance a document-oriented SMT system with distributional semantic information in the form of bilingual and monolingual word embeddings. In particular, these embeddings are used as Semantic Space Language Models (SSLMs) and as a novel feature function. 
The goal of the former is to promote word translations that are semantically close to their preceding context, whereas the latter promotes the lexical choice that is closest to its surrounding context for those words that have varying translations throughout the document. In both cases, the context extends beyond sentence boundaries. Recently, the MT community has transitioned to the neural paradigm. The final step of our research proposes an extension of the decoding process for a Neural Machine Translation (NMT) framework, independent of the model architecture, that uses shallow fusion to combine the information from a neural translation model with the context semantics captured by the previously studied SSLMs. The aim of this modification is to bring the benefits of context information into the decoding process of NMT systems as well, and to obtain an additional validation of the techniques we explored. The automatic evaluation of our approaches does not reflect significant variations. This is expected, since most automatic metrics are neither context- nor semantic-aware and because the phenomena we tackle are rare, leading to few modifications with respect to the baseline translations. On the other hand, manual evaluations demonstrate the positive impact of our approaches, since human evaluators tend to prefer the translations produced by our document-aware systems. Therefore, the changes introduced by our enhanced systems are important, since they are related to how humans perceive translation quality for long texts.
    Postprint (published version)

    Robust Part of Speech Tagging

    Generally, NLP tools use well-formed, annotated data to learn patterns by means of machine learning techniques. In this work, however, we focus on the language used in an online platform for machine translation. A typical setting in this area is a web page that offers a translation service between pairs of languages. The problem is that casual users use the service to translate any type of text (cut-and-paste fragments, single words, badly formatted text, snippets, informal language, pre-translations, etc.). In this situation, the input very often contains words with mistakes, which make the system produce a bad translation because it is not able to understand the input. The main goal of our work, once we have identified the problem of dealing with non-standard input, is to develop a robust PoS tagger from the SVMTagger

    Experiments on document level machine translation

    Most current SMT systems work at the sentence level. They translate a text assuming that sentences are independent, but when one looks at a well-formed document, it is clear that there are many inter-sentence relations. Much contextual information is, unfortunately, lost when sentences are translated independently. We want to improve translation coherence and cohesion using document-level information, so we are interested in developing new strategies that take advantage of context information to achieve this goal. For example, we want to approach this challenge by developing post-processes that try to fix a first translation obtained by an SMT system. We are also interested in taking advantage of the document-level translation framework provided by the Docent decoder to implement and test some of our ideas. An analogous problem arises with automatic MT evaluation metrics: most of them are designed at the sentence level, so they do not capture improvements in lexical cohesion and coherence or in discourse structure. However, we leave this topic for future work.
    Preprint

    Document-level machine translation with word vector models

    In this paper we apply distributional semantic information to document-level machine translation. We train monolingual and bilingual word vector models on large corpora and evaluate them first on a cross-lingual lexical substitution task and then on the final translation task. For translation, we incorporate the semantic information into a statistical document-level decoder (Docent) by enforcing translation choices that are semantically similar to the context. As expected, the bilingual word vector models are more appropriate for the purpose of translation. The final document-level translator incorporating the semantic model outperforms the basic Docent (without semantics) and also slightly outperforms a standard sentence-level SMT system in terms of ULC (the average of a set of standard automatic evaluation metrics for MT). Finally, we also present a manual analysis of the translations of some specific documents.
    Peer Reviewed. Postprint (published version)

    Traducción automática a nivel de documento como proceso de retraducción

    Most current Machine Translation systems are designed to translate a document sentence by sentence, ignoring discourse information and producing incoherences in the final translations. In this paper we present some document-level-oriented post-processes to improve the coherence and consistency of translations. Incoherences are detected and new partial translations are proposed. The work focuses on two phenomena: words with inconsistent translations throughout a text, and gender and number agreement among words. Since we deal with specific phenomena, an automatic evaluation does not reflect significant variations in the translations. However, improvements are observed through a manual evaluation.
    Supported by an FPI grant within the OpenMT2 project (TIN2009-14675-C03) from the Spanish Ministry of Science and Innovation (MICINN) and by the TACARDI project (TIN2012-38523-C02) of the Spanish Ministerio de Economía y Competitividad (MEC).

    Document-level machine translation as a re-translation process

    Most current Machine Translation systems are designed to translate a document sentence by sentence, ignoring discourse information and producing incoherences in the final translations. In this paper we present some document-level-oriented post-processes to improve the coherence and consistency of translations. Incoherences are detected and new partial translations are proposed. The work focuses on two phenomena: words with inconsistent translations throughout a text, and gender and number agreement among words. Since we deal with specific phenomena, an automatic evaluation does not reflect significant variations in the translations. However, improvements are observed through a manual evaluation.
    Peer Reviewed. Postprint (published version)

    Hamiltonian facets of classical gauge theories on E-manifolds

    Manifolds with boundary, with corners, b-manifolds and foliations model configuration spaces for particles moving under constraints and can be described as E-manifolds. E-manifolds were introduced in [NT01] and investigated in depth in [MS20]. In this article we explore their physical facets by extending gauge theories to the E-category. Singularities in the configuration space of a classical particle can be described in several new scenarios unveiling their Hamiltonian aspects on an E-symplectic manifold. Following the scheme inaugurated in [Wei78], we show the existence of a universal model for a particle interacting with an E-gauge field. In addition, we generalize the description of phase spaces in Yang-Mills theory as Poisson manifolds and their minimal coupling procedure, as shown in [Mon86], for base manifolds endowed with an E-structure. In particular, the reduction at coadjoint orbits and the shifting trick are extended to this framework. We show that Wong's equations, which describe the interaction of a particle with a Yang-Mills field, become Hamiltonian in the E-setting. We formulate the electromagnetic gauge in a Minkowski space relating it to the proper time foliation and we see that our main theorem describes the minimal coupling in physical models such as the compactified black hole.
    Preprint

    Comportamiento del consumidor en tiempos de crisis: la crisis sanitaria del Covid – 19

    The aim of this work is to understand how the Covid-19 health crisis affects consumers' purchasing behaviour, what the new consumption habits are, and what consumers think about the economic situation. Consumers' purchasing habits are studied continuously and from different perspectives and disciplines. These habits change constantly and differ depending on the place in which they are analysed, the culture, the environment, gender, age, purchasing power or income, among other variables. In this context, decisive economic variables come into play, especially in periods of crisis or uncertainty. To meet this objective, a market study was carried out by means of a survey. The most relevant results were that the pandemic has affected consumer behaviour by reducing spending on leisure and free-time activities, mainly as a result of the restrictions, which in turn has produced a tendency to save. Food is the main household expense, and it has increased because people no longer eat out as they did before. Comparing Black Friday spending in 2020 with the previous year, and knowing the spending expectations for the Christmas and sales periods, it can be concluded that this spending has mostly decreased.

    Biodegradation of PLA-PHBV Blend Films as Affected by the Incorporation of Different Phenolic Acids

    Films based on a 75:25 polylactic acid (PLA) and poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (PHBV) blend, containing 2% (w/w) of different phenolic acids (ferulic, p-coumaric or protocatechuic acid) and plasticised with 15 wt. % polyethylene glycol (PEG 1000), were obtained by melt blending and compression moulding. The disintegration and biodegradation of the films under thermophilic composting conditions were studied over 35 and 45 days, respectively, in order to analyse the effect of incorporating the antimicrobial phenolic acids into the films. Sample mass loss, thermo-degradation behaviour and visual appearance were analysed at different times of the composting period. No effect of phenolic acids was observed on the film disintegration pattern, and the films were completely disintegrated at the end of the composting period. The biodegradation analysis through CO2 measurements revealed that PLA-PHBV blend films without phenolic acids, and with ferulic acid, completely biodegraded after 20 composting days, while p-coumaric and protocatechuic acids slightly retarded full biodegradation (21 and 26 days, respectively). Phenolic acids mainly extended the induction period, especially protocatechuic acid. PLA-PHBV blend films with potential antimicrobial activity could be used to preserve fresh foodstuff susceptible to microbial spoilage, with their biodegradation under composting conditions being ensured.
    Funding: This research was funded by Ministerio de Ciencia e Innovacion of Spain through the Projects AGL2016-76699-R and PID2019-105207RB-I00, and the predoctoral research grant #BES-2017-082040.
    Hernandez-Garcia, E.; Vargas, M.; Chiralt Boix, MA.; González Martínez, MC. (2022). Biodegradation of PLA-PHBV Blend Films as Affected by the Incorporation of Different Phenolic Acids. Foods. 11(2):1-15. https://doi.org/10.3390/foods11020243

    Biodegradable Antimicrobial Films for Food Packaging: Effect of Antimicrobials on Degradation

    The environmental problem generated by the massive consumption of plastics makes it necessary to develop biodegradable antimicrobial materials that can extend food shelf-life without having a negative impact on the environment. The current situation regarding the availability of biodegradable food packaging materials has been analysed, as well as different studies where antimicrobial compounds have been incorporated into the polymer matrix to control the growth of pathogenic or spoilage bacteria. Thus, the antimicrobial activity of active films based on different biodegradable polymers and antimicrobial compounds has been discussed. Likewise, relevant information on biodegradation studies carried out with different biopolymers in different environments (compost, soil, aquatic), and the effect of some antimicrobials on this behavior, is reviewed. In most of the studies, no relevant effect of the incorporated antimicrobials on the degradation of the polymer was observed, but some antimicrobials can delay the process. The changes in biodegradation pattern due to the presence of the antimicrobial are attributed to its influence on the microorganism population responsible for the process. More studies are required to determine the specific influence of the antimicrobial compounds on the biodegradation behavior of polymers in different environments. No studies have been carried out in marine media to this end.
    This research was funded by Ministerio de Ciencia e Innovacion of Spain through the Projects AGL2016-76699-R and PID2019-105207RB-I00, and the predoctoral research grant #BES-2017-082040.
    Hernandez-Garcia, E.; Vargas, M.; González Martínez, MC.; Chiralt Boix, MA. (2021). Biodegradable Antimicrobial Films for Food Packaging: Effect of Antimicrobials on Degradation. Foods. 10(6):1-23. https://doi.org/10.3390/foods10061256